openvla policy integration #10
| Thanks for the contribution! Could you post the success rates for each task? Could you post a source for the implementation of  Code comments: 
 | 
| Hi xuanlinli17, I have corrected all the typos. The implementation of  | 
| I see. Might want to get help from the official authors to validate / revise the implementation as the Bridge results are near zero for some reason, and pick coke can has large variance across different backgrounds. Additionally, it's possible that OpenVLA might not follow Octo implementation in real deployment. | 
| Also you can modify https://github.com/simpler-env/SimplerEnv/blob/main/tools/calc_metrics_evaluation_videos.py to quickly summarize the results for OpenVLA (just put dummy numbers for the real numbers and don't push the script). You can ignore the nans. | 
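For readers who only want a quick per-task summary while some entries are still missing, the idea of "ignoring the nans" can be sketched as below. This is a minimal illustration, not the logic of `calc_metrics_evaluation_videos.py`; the function name `summarize_success`, the task names, and the numbers are all hypothetical.

```python
import math


def summarize_success(results):
    """Return the mean success rate per task, skipping NaN entries.

    `results` maps a task name to a list of per-setting success rates.
    A task whose entries are all NaN is reported as NaN.
    """
    summary = {}
    for task, values in results.items():
        valid = [v for v in values if not math.isnan(v)]
        summary[task] = sum(valid) / len(valid) if valid else float("nan")
    return summary


# Hypothetical numbers for illustration only, not measured results.
example = {
    "task_a": [1.0, 0.0, float("nan"), 1.0],
    "task_b": [float("nan")],
}
summary = summarize_success(example)
```

Averaging only over the valid entries keeps a single missing run from turning the whole task summary into NaN.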
| I'm also working on implementing OpenVLA into SimplerEnv, and I had the same issue: OpenVLA fails drastically on Bridge. I wonder if that has anything to do with the controller mentioned in #11 | 
| Same here: OpenVLA performs very poorly on the WidowX robot. | 
| I checked with the authors and I don't think there is action ensembling or action history. Here is the updated code, which you can try:  | 
| OpenVLA setup requirements: Please add these instructions to Readme and add an "OpenVLA Inference Setup" section | 
| Typos in  | 
| 
 Hello @xuanlinli17, I tried the code above but openvla still fails on widowx tasks. Is it possibly an implementation problem? Should I set any params for widowx? | 
| Yeah, that's my finding too, but I don't think the authors applied any special treatment when evaluating OpenVLA on Bridge. There might be some coordinate transforms on Bridge that are different. | 
| @xuanlinli17 Thank you! I will continue to look into this. If I find any additional information or solutions, I'll make sure to share it with you. | 
| When I run the scripts  How to fix it? | 
| ^ Run  | 
| I have made a run, here is the full result of vla: I use my branch as codebase: https://github.com/hilookas/SimplerEnv (note: the "real" result is set to 0)  | 
| For Google Robot pick coke can, looks like the variant aggregation eval of OpenVLA is a lot better than visual matching, which is interesting...  | 
| @hilookas Thanks for providing the results! | 
| @hilookas Thank you for your great work! I want to try out OpenVLA in the simulator as well. But I wonder why the performance in sim is not as good as what the paper claims for the real-world benchmark. It should not be due to the sim-to-real gap, right? SIMPLER is designed to mitigate this gap. | 
| @QuanyiLi OpenVLA did 5 trials in real for each task (and there's no grid-based evals with >= 50 trials per task for Google Robot like in Simpler). Task settings like the cabinets and backgrounds used in the real world can also be different. We are requesting paired sim-real evaluation from Google following Simpler's protocol (and the same backgrounds, cabinets, etc). | 
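To make the trial-count point concrete, here is a rough sketch of how the uncertainty of a measured success rate shrinks with the number of trials. It assumes independent trials (a binomial model) and a hypothetical true success rate of 0.5; neither assumption comes from the thread.

```python
import math


def success_rate_std_error(p, n):
    """Standard error of a success rate estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical true success rate of 0.5, chosen only for illustration.
se_5 = success_rate_std_error(0.5, 5)    # 5 trials per task
se_50 = success_rate_std_error(0.5, 50)  # 50 trials per task
```

Under these assumptions the standard error with 5 trials is about 0.22, versus about 0.07 with 50 trials, so a 5-trial real-world number and a 50-trial sim number can differ noticeably from sampling noise alone.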
| 
 Thanks. Look forward to the updated results! | 
| 
 Sure! I have updated the results. Please see the log above. My results differ slightly from xuanlinli17's run in 
 | 
| How much memory is needed to run OpenVLA? I tried 3090 and 40GB A100 but both go out of memory. | 
| @yxchng It takes 15 GB of VRAM on my 4090, following the official instructions to use bf16. | 
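A back-of-the-envelope check of that figure, as a sketch only: it assumes roughly 7 billion parameters for OpenVLA and counts weights alone, so activations, the CUDA context, and the simulator's own GPU usage come on top. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory needed for model weights alone, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: about 7e9 parameters; weights only, no activations or overhead.
bf16_gb = weight_memory_gb(7e9, 2)  # bf16 stores 2 bytes per parameter
fp32_gb = weight_memory_gb(7e9, 4)  # fp32 stores 4 bytes per parameter
```

With these assumptions the bf16 weights come to about 14 GB, in line with the ~15 GB reported above, while fp32 weights alone would be about 28 GB, more than the 24 GB on a 3090.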
| 
 1x3090 is enough. Just remember not to run more than one env and inference process at a time. 
| 
 @hilookas Would like to know where the tables come from? I did not see them in the OpenVLA paper | 
| 
 I made it :D Based on my experiment above. If you have another run result, please let me know! | 
| 
 I do not have enough GPU resources attached to a screen, which the SAPIEN simulator requires. :-( Running OpenVLA locally is quite a burden for most consumer-level PCs. 
| ^ I think you need a local CUDA version >= 11.6, probably one that matches your torch version. | 
| When I run  My machine has an RTX 4090 and runs Ubuntu. | 
| hello, I'm curious why you didn't add 'In: What action should the robot take to {INSTRUCTION}?\nOut:' when inputting prompts to the processor? Would adding or not adding this sentence affect the results? | 
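For reference, the template quoted in the question can be built with a small helper like the one below. This is only a sketch of the string formatting; the function name `build_openvla_prompt` is hypothetical, and whether the wrapper changes evaluation results is not established in this thread.

```python
def build_openvla_prompt(instruction):
    """Wrap a task instruction in the prompt template quoted above."""
    return f"In: What action should the robot take to {instruction.strip()}?\nOut:"


# Example instruction chosen for illustration.
prompt = build_openvla_prompt("put carrot on plate")
```

Comparing a run that passes the wrapped prompt against one that passes the raw instruction would be one way to answer the question empirically.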
| Hi guys, I would like to ask whether it is inadvisable to evaluate OpenVLA on the SIMPLER simulation platform, given that it performed very poorly on the WidowX (Bridge) tasks. Is there a better method to reproduce the results presented in the OpenVLA paper? | 
| 
 OpenVLA didn't apply any augmentation during training, compared to other models like RT-*. This could explain why they performed poorly on Simpler on WidowX (even though their result on Google Robot is fine). | 
| Hi, any update on bridge tasks? | 
| Hi, I also wanted to report that OpenVLA performance on Bridge tasks is drastically low. The one setting where it did reach a ~10% success rate was when ray-tracing was unabled. This was for the Put Carrot on Plate task, but with a toy kitchen background. Even then, it was only for one carrot position; for other carrot positions, it failed most of the time. | 
| 
 @xuanlinli17 I'm not sure that the coordinate transforms are different between Bridge and the OpenVLA action output. Can anyone else confirm this? At least on hardware, we saw good performance of OpenVLA for Put Carrot on Plate (~70%). Is there any update on this issue? | 
| If it works on hardware then it should be fine, as there have been quite a few papers published after OpenVLA that demonstrate robustness to Simpler visuals on Bridge. | 
Fix openvla loading from saved_model_path
Update model name








This pull request integrates the OpenVLA policy. The evaluation scripts remain consistent with those in the original repo under ./scripts/